DOC: AI Usage Policy (draft for discussion) by aladinor · Pull Request #363 · openradar/xradar

aladinor · 2026-04-20T17:35:04Z

Summary

Opens for community discussion an AI Usage Policy for xradar, adapted from xarray's AI Usage Policy with two xradar-specific additions.

This is a draft so the whole community can chime in before we merge. Continues the discussion started in #354.

What's in the draft

docs/ai_policy.md — full policy (attribution to xarray in the preamble).
CONTRIBUTING.md — short pointer section between "Types of Contributions" and "Get Started!" linking to the full doc.
docs/index.md — new ai_policy entry in the toctree.
docs/history.md — Development entry (PR-number placeholder to update once this lands).

Structure

The policy mirrors xarray's structure so folks coming from that ecosystem find it familiar:

Core Principle: Changes — you are responsible for every line.
Core Principle: Communication — PR descriptions and review replies must be your own words.
Code and Tests
- Review Every Line (with Not-Acceptable / Acceptable examples).
- Prefer Small PRs and Open an Issue First — framed by review burden, not line count. A 100-line AI-generated change can be just as hard to review as a 2,000-line one. Strongly encouraged: open an issue before any non-trivial AI-assisted contribution so maintainers can validate scope and structure before any code is written.
- CI, Packaging, and Dependency Changes — new section. GitHub Actions, dependencies, pyproject.toml / environment.yml / pre-commit config, and security-sensitive areas require an issue-first discussion; AI is not a reliable guide to the security or maintenance implications of a new dependency. Raised by @zssherman in MNT: Improve Contributing Guide #354.
Documentation — names xradar's domain specifics (CfRadial2/FM301, WMO Manual on Codes, format ICDs).
Disclosing AI Usage — new section. Recommends (not requires) noting the tool/model and version in the PR description when AI was used. Also raised by @zssherman.
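
The file categories named in the "CI, Packaging, and Dependency Changes" item could in principle be flagged mechanically. The following is a minimal sketch only, not part of the draft policy or of xradar: the pattern list, the `needs_issue_first` helper, and the example paths are all hypothetical, chosen to mirror the areas listed above (GitHub Actions, pyproject.toml, environment.yml, pre-commit config).

```python
from fnmatch import fnmatchcase

# Hypothetical patterns mirroring the areas the draft names as issue-first.
# The real set of sensitive files would be decided by the maintainers.
SENSITIVE_PATTERNS = (
    ".github/workflows/*",
    "pyproject.toml",
    "environment.yml",
    ".pre-commit-config.yaml",
)


def needs_issue_first(changed_paths):
    """Return the changed paths that match any sensitive pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in SENSITIVE_PATTERNS)
    ]


if __name__ == "__main__":
    # Illustrative paths only; not taken from this PR's diff.
    example = [
        "xradar/some_module.py",
        "pyproject.toml",
        ".github/workflows/ci.yml",
    ]
    print(needs_issue_first(example))
```

A check like this can only point a reviewer at the files that deserve a prior discussion; it says nothing about whether a given change is safe.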

Discussion points

This is a policy document — language matters more than usual. Feedback especially welcome on:

Whether the "CI / Packaging / Dependency" framing is the right level of strictness.
Whether to require (not just recommend) disclosing the AI tool/model.
Whether the examples in "Not Acceptable / Acceptable" need xradar-specific replacements.
Whether the file should live at docs/ai_policy.md or somewhere else (e.g. a new docs/contribute/ subfolder to mirror xarray's layout).

CC: @kmuehlbauer @syedhamidali @mgrover1 @egouden @zssherman @scollis @rcjackson @jrobrien91

Test plan

File renders cleanly in GitHub's markdown preview.
Sphinx build picks up ai_policy in the toctree (cd docs && make html).
Any maintainer happy to sanity-check the policy language before we take it out of draft.

@zssherman

Adds docs/ai_policy.md, adapted from xarray's policy (doc/contribute/ai-policy.md) with attribution, and xradar-specific additions:
- "CI, Packaging, and Dependency Changes" subsection requiring an issue-first discussion before any AI-assisted PR touches GitHub Actions, dependencies, pyproject/environment files, pre-commit config, or security-sensitive areas. Rationale: supply-chain risk, raised by @zssherman in openradar#354.
- "Disclosing AI Usage" section recommending (not requiring) that PR descriptions note the tool/model and version when AI was used.

Also:
- Short "AI Usage Policy" pointer section in CONTRIBUTING.md linking to the full policy.
- Adds ai_policy to docs/index.md toctree.
- History entry (PR number placeholder).

Discussed in openradar#354.

Reframe "Large AI-Assisted Contributions" -> "Prefer Small PRs and Open an Issue First". Drop the absolute 2,000-line example in favor of review burden as the criterion (a 100-line change can also be too dense to review in isolation). Add an explicit "strongly encouraged" issue-first step so maintainers can validate scope and structure before any code is written.

codecov · 2026-04-20T17:40:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.85%. Comparing base (464b4dc) to head (0bd4d9d).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #363   +/-   ##
=======================================
  Coverage   93.85%   93.85%           
=======================================
  Files          28       28           
  Lines        6165     6165           
=======================================
  Hits         5786     5786           
  Misses        379      379

| Flag | Coverage Δ |
| --- | --- |
| notebooktests | `0.00% <ø> (ø)` |
| unittests | `93.85% <ø> (ø)` |

zssherman · 2026-04-21T16:37:28Z

This looks great! Thanks for putting this together!

kmuehlbauer · 2026-05-07T06:41:52Z

I want to bring this to your attention: the maintainers of GDAL are currently discussing restricting AI/LLM usage in OSGeo/gdal#14500.

The arguments on the gdal-dev list are also worth checking: https://lists.osgeo.org/pipermail/gdal-dev/2026-May/061592.html.

A particularly interesting example of AI introducing bloated code that review by an experienced maintainer did not catch:

JP2Grok: new raster driver for Grok JPEG 2000 toolkit OSGeo/gdal#14204 (PR with AI)
Optimize CPLHaveRuntimeAVX2() on gcc and Simplify rasterio.cpp wrt AVX2 use OSGeo/gdal#14236 (fix by maintainer)

There is a nice overview of different AI policies over at https://github.com/melissawm/open-source-ai-contribution-policies.

A nice blog post from Stéfan van der Walt covers many aspects of the ongoing discussions: https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/.

A main concern in many of the above is copyright. The suggested policy does not mention copyright or code ownership at all. It is worth discussing how we should go forward.

egouden · 2026-05-23T13:03:30Z

This is an interesting and difficult topic which redefines our work. Here are a few comments:

The AI world is evolving fast. I think it is difficult to define rules in too much detail. I would use a few principles and review them regularly.
In my experience, the human and the LLM benefit from discussing the goal and the logic of an issue together. The LLM is much more efficient than the human at implementing a solution that follows standard best practices. Having a human review each line significantly reduces the benefit. Another independent LLM can actually do the review under human control.
It is important to keep the code concise in order to ensure the quality of the LLM output and reduce its cost (the latest big models are actually quite good at this). Important information that human programmers use implicitly in this particular project (e.g. the FM301 standard) should be communicated to the LLM together with repo best practices in an agent.md.
Should we allow any kind of LLM provider and usage, or should we restrict it following some official security guidelines (at least for the review part)? I expect the use of LLMs to be consistent with our license.

kmuehlbauer · 2026-05-26T07:36:39Z

As mentioned before, GDAL, a cornerstone project in geospatial (MIT license), just adapted their AI policy, restricting AI usage to the minimum.

We should be very careful in adopting AI usage, especially for the following points:

Copyright Infringement & "Code Laundering"
Open-Source Licensing Violations (Copyleft)
Ownership and IP

Can we legally prove that a third party doesn't actually own LLM generated code which is about to be merged into our code base? Is it possible at all to check if LLM generated code is not violating any licenses? Who owns the LLM generated code, if authorship requires human creation?

egouden · 2026-05-26T08:53:21Z

I created an experimental PR using AI for both implementation and review to help the discussion (#383).

rcjackson · 2026-05-26T11:28:07Z

@kmuehlbauer I don't think there is any practical way to enforce copyrights other than requiring that the human submitting the PR attest that they have not violated copyright (to which anyone can just say "no").

kmuehlbauer · 2026-05-26T13:29:38Z

@rcjackson Yes, sure, we are trusting each contributor that the submitted code is of their own origin and/or can be ingested legally. But, given that we know LLMs have been trained on FOSS and any other available code, how can the human submitting the PR attest to that at all? How should the human know whether the LLM reproduced a copyrighted part of its training data? We would be shifting our trust from humans to black boxes.

Who is the author of that particular generated code? Is it the human submitter, or does it belong to the public domain? There are so many unanswered questions with regard to LLM-generated code that I find it at least difficult, if not impossible, to use it safely.

rcjackson · 2026-05-26T13:39:35Z

@kmuehlbauer there is no guarantee that a human-only PR does not violate copyright either. A human can do the same thing, and often does learn from FOSS software. If you apply that standard to every PR, then no PR can ever be trusted, human or AI.

kmuehlbauer · 2026-05-26T14:36:48Z

@rcjackson Thanks, that's all valid. And, true, there is no such guarantee. A human contributor may be imperfect, but they are still a legible and accountable source. An LLM is not. If we treat both as equivalent, we are not being consistent, IMHO.

The only escape hatch is to trust the human submitter of LLM-generated code as if it were entirely their own code. Going down that path still does not account for any future legal developments.

Yes, maybe I'm too cautious and unsure about it. But others obviously feel the same, if we look at how the different AI policies are evolving across projects and organizations. There is still no real consensus on how to properly classify or govern LLM-generated contributions. That lack of consensus is itself a signal that the problem is not settled at all.

I'm with those who converge on caution rather than assuming equivalence between human and LLM generated code.

Would be great to hear more opinions. Should we be more strict or more permissive here? What concrete criteria would make LLM assisted contributions clearly acceptable without weakening our expectations around authorship, accountability, and provenance?

aladinor added 3 commits April 20, 2026 12:03

DOC: history.md: link AI Usage Policy entry to PR openradar#363

0bd4d9d

This was referenced Apr 20, 2026

MNT: Improve Contributing Guide #354

Merged

REL: 0.12.0 #359

Merged

aladinor mentioned this pull request Apr 20, 2026

CI: main-branch TestPyPI upload fails due to missing id-token: write permission #364

Closed

aladinor wants to merge 3 commits into openradar:main from aladinor:docs/ai-usage-policy